Add llama onnx export & onnxruntime support #975
Conversation
Hi, I was trying to obtain an ONNX export of a llama model with the optimum library using the command below (transformers version = "4.28.1"; the model path points to a Hugging Face checkpoint), but I was facing an issue that I was not getting when working with HuggingFaceM4/tiny-random-LlamaForCausalLM: `Framework not specified. Using pt to export to ONNX.` Could you help me figure out what might be the issue in this scenario?
Me too. Is there any example command to export LLaMA to fp16 ONNX? Thanks!
@gjain7 The problem is that in decapoda-research/llama-13b-hf the model class specified in the …
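As a side note, one way to check which model class a checkpoint declares is to inspect its config; a minimal sketch, using the repo id from this thread:

```python
# Sketch: print the architecture name declared in the checkpoint's config.json.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("decapoda-research/llama-13b-hf")
print(config.architectures)  # the class name(s) transformers will try to instantiate
```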
@eric8607242 Could you try the following command please?

```
optimum-cli export onnx --model path_to_model --fp16 --optimize O2 output_dir
```
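For reference, a rough Python-side equivalent of that export (without the `--fp16` / `--optimize O2` post-processing the CLI applies) could look like the sketch below; `path_to_model` and `output_dir` are placeholders, not real paths:

```python
# Sketch: export a causal-LM checkpoint to ONNX with optimum, then save it locally.
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForCausalLM

model = ORTModelForCausalLM.from_pretrained("path_to_model", export=True)
tokenizer = AutoTokenizer.from_pretrained("path_to_model")

model.save_pretrained("output_dir")
tokenizer.save_pretrained("output_dir")
```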
@regisss Hi, thanks for your response. It is very helpful!
@regisss Thanks for the suggestion, it did work with the model you specified (y). Unlike other models, llama was giving 3 .onnx files as output: decoder_model_merged.onnx, decoder_model.onnx, and decoder_with_past_model.onnx, along with decoder_model_merged.onnx_data (48 GB), decoder_model.onnx_data (48 GB), and decoder_with_past_model.onnx_data (48 GB). Why is it giving this output, and if I want to proceed to Triton, which .onnx file should I go with? It would be really helpful if these queries are answered. Thanks
@gjain7 Quoting @echarlaix here:
So, in your case, I recommend that you use …
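For completeness, local inference with ONNX Runtime through optimum does not require picking one of the files manually; a minimal sketch, assuming `output_dir` is the folder produced by the export command above:

```python
# Sketch: load the exported ONNX decoder from a local folder and run generation.
from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForCausalLM

model = ORTModelForCausalLM.from_pretrained("output_dir", use_cache=True)
tokenizer = AutoTokenizer.from_pretrained("output_dir")

generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(generator("Hello, my name is", max_new_tokens=20))
```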
@regisss Thank you, it was very useful information. I was able to clear up my doubts.
Hi @regisss :) I'm trying to export TinyLlama-1.1B-intermediate-step-480k-1T to ONNX (both with optimum.onnxruntime and optimum-cli), but it fails with dimension mismatch errors. Since Llama is supported by the ONNX export now, do you mind giving some insight into why this llama model cannot be exported? Here's the script and the corresponding error:

```python
import os
from pathlib import Path
import transformers
from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForCausalLM

model = ORTModelForCausalLM.from_pretrained("PY007/TinyLlama-1.1B-intermediate-step-480k-1T", from_transformers=True)
```

```
The argument `from_transformers` is deprecated, and will be removed in optimum 2.0. Use `export` instead
Framework not specified. Using pt to export to ONNX.
Using the export variant default. Available variants are:
- default: The default ONNX variant.
Using pad_token, but it is not set yet.
Using pad_token, but it is not set yet.
Using pad_token, but it is not set yet.
Using pad_token, but it is not set yet.
Using framework PyTorch: 2.1.0+cu118
Overriding 1 configuration item(s)
- use_cache -> True
C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\transformers\models\llama\modeling_llama.py:808: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if input_shape[-1] > 1:
C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\transformers\models\llama\modeling_llama.py:146: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if seq_len > self.max_seq_len_cached:
C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\transformers\models\llama\modeling_llama.py:375: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if attn_weights.size() != (bsz, self.num_heads, q_len, kv_seq_len):
C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\transformers\models\llama\modeling_llama.py:382: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if attention_mask.size() != (bsz, 1, q_len, kv_seq_len):
C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\transformers\models\llama\modeling_llama.py:392: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if attn_output.size() != (bsz, self.num_heads, q_len, self.head_dim):
Saving external data to one file...
Using framework PyTorch: 2.1.0+cu118
Overriding 1 configuration item(s)
- use_cache -> True
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python311\Lib\site-packages\optimum\onnxruntime\modeling_ort.py", line 647, in from_pretrained
return super().from_pretrained(
^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Python311\Lib\site-packages\optimum\modeling_base.py", line 372, in from_pretrained
return from_pretrained_method(
^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Python311\Lib\site-packages\optimum\onnxruntime\modeling_decoder.py", line 574, in _from_transformers
main_export(
File "C:\Python311\Lib\site-packages\optimum\exporters\onnx\__main__.py", line 505, in main_export
_, onnx_outputs = export_models(
^^^^^^^^^^^^^^
File "C:\Python311\Lib\site-packages\optimum\exporters\onnx\convert.py", line 752, in export_models
export(
File "C:\Python311\Lib\site-packages\optimum\exporters\onnx\convert.py", line 855, in export
export_output = export_pytorch(
^^^^^^^^^^^^^^^
File "C:\Python311\Lib\site-packages\optimum\exporters\onnx\convert.py", line 572, in export_pytorch
onnx_export(
File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\torch\onnx\utils.py", line 516, in export
_export(
File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\torch\onnx\utils.py", line 1596, in _export
graph, params_dict, torch_out = _model_to_graph(
^^^^^^^^^^^^^^^^
File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\torch\onnx\utils.py", line 1135, in _model_to_graph
graph, params, torch_out, module = _create_jit_graph(model, args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\torch\onnx\utils.py", line 1011, in _create_jit_graph
graph, torch_out = _trace_and_get_graph_from_model(model, args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\torch\onnx\utils.py", line 915, in _trace_and_get_graph_from_model
trace_graph, torch_out, inputs_states = torch.jit._get_trace_graph(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\torch\jit\_trace.py", line 1285, in _get_trace_graph
outs = ONNXTracedModule(
^^^^^^^^^^^^^^^^^
File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\torch\jit\_trace.py", line 133, in forward
graph, out = torch._C._create_graph_by_tracing(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\torch\jit\_trace.py", line 124, in wrapper
outs.append(self.inner(*trace_inputs))
^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\torch\nn\modules\module.py", line 1508, in _slow_forward
result = self.forward(*input, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Python311\Lib\site-packages\optimum\exporters\onnx\model_patcher.py", line 112, in patched_forward
outputs = self.orig_forward(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\transformers\models\llama\modeling_llama.py", line 1038, in forward
outputs = self.model(
^^^^^^^^^^^
File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\torch\nn\modules\module.py", line 1508, in _slow_forward
result = self.forward(*input, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\transformers\models\llama\modeling_llama.py", line 925, in forward
layer_outputs = decoder_layer(
^^^^^^^^^^^^^^
File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\torch\nn\modules\module.py", line 1508, in _slow_forward
result = self.forward(*input, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\transformers\models\llama\modeling_llama.py", line 635, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
^^^^^^^^^^^^^^^
File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\torch\nn\modules\module.py", line 1508, in _slow_forward
result = self.forward(*input, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\transformers\models\llama\modeling_llama.py", line 365, in forward
key_states = torch.cat([past_key_value[0], key_states], dim=2)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Sizes of tensors must match except in dimension 2. Expected size 32 but got size 4 for tensor number 1 in the list.
```
Hi @xijianlou1, thank you for the report. Can you try on the main branch? This is likely the same issue as #1399 and should be fixed if you install from source. We'll have an upcoming release.
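In case it helps, installing optimum from source is typically just:

```
pip install git+https://github.com/huggingface/optimum.git
```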
As per title